Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

Copy link
Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 14, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3899

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3899

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3899

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3899

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3899

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3899

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3899

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3899

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3899

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3899

commit: efdfa84

@MasterPtato MasterPtato force-pushed the 01-14-fix_engine-runner_handle_shutdown_close_codes_correctly branch from 600206d to 55a38d5 Compare January 14, 2026 22:45
@MasterPtato MasterPtato force-pushed the 01-14-fix_pegboard_cache_none_runner_pool_errors_too branch 2 times, most recently from 19b864f to 166bf69 Compare January 14, 2026 22:47
@MasterPtato MasterPtato force-pushed the 01-14-fix_engine-runner_handle_shutdown_close_codes_correctly branch from 55a38d5 to 715b5e8 Compare January 14, 2026 22:47
@claude
Copy link

claude bot commented Jan 14, 2026

PR Review: fix(pegboard): cache none runner pool errors too

Summary

This PR fixes a caching issue where runner pools without errors (i.e., active_error = None) were not being cached. The change ensures both error and non-error states are cached, with a filtering step to return only actual errors to callers.

Code Quality ✅

Positive aspects:

  • Clean separation of concerns with the new RunnerPoolErrorCacheEntry struct for internal caching
  • Proper use of filter_map to transform cached entries to public API format
  • Maintains backward compatibility - the public API (RunnerPoolErrorEntry) remains unchanged
  • Good use of Option to represent the presence/absence of errors

Adherence to CLAUDE.md:

  • ✅ Uses structured logging correctly (tracing::warn!, tracing::error!)
  • ✅ Proper error handling with Result types
  • ✅ Uses workspace dependencies appropriately
  • ✅ Follows naming conventions (snake_case, descriptive names)

Potential Issues

1. Cache Inconsistency for Missing Workflows ⚠️

Location: Lines 90-95

There's a subtle bug: when find_workflows returns None for a runner (workflow doesn't exist), that runner is silently skipped and not added to the cache. This means:

for (workflow_id, (namespace_id, runner_name)) in workflow_ids.into_iter().zip(runners.iter()) {
    if let Some(workflow_id) = workflow_id {
        // Only tracked runners get cached
        workflow_to_runner.insert(workflow_id, (*namespace_id, runner_name.clone()));
        workflow_ids_to_fetch.push(workflow_id);
    }
    // Missing: runners without workflows are not cached at all
}

Impact: If a runner pool has never had the error tracker workflow created, subsequent calls won't benefit from caching and will repeatedly query workflows that don't exist.

Suggested fix:

// After line 101, before workflows fetch
let mut result = Vec::new();

// First, add entries for runners that have no workflow at all
for (workflow_id_opt, (namespace_id, runner_name)) in workflow_ids.iter().zip(runners.iter()) {
    if workflow_id_opt.is_none() {
        result.push(RunnerPoolErrorCacheEntry {
            namespace_id: *namespace_id,
            runner_name: runner_name.clone(),
            error: None,
        });
    }
}

// Then proceed with existing workflow fetching logic...

2. Cache Poisoning on Parse Errors ⚠️

Location: Lines 119-127

When state parsing fails, the code logs an error and continues, which means that runner is not added to the cache result. This causes:

  • Cache miss for that specific runner
  • Repeated attempts to parse broken state on every call
  • Wasted database/workflow queries

Suggested approach: Cache parse errors as None to avoid repeated failed parsing attempts:

Err(err) => {
    tracing::error!(
        %namespace_id,
        %runner_name,
        ?err,
        "failed to parse error tracker state"
    );
    // Cache as None to avoid repeated parsing attempts
    result.push(RunnerPoolErrorCacheEntry {
        namespace_id: *namespace_id,
        runner_name: runner_name.clone(),
        error: None,
    });
    continue;
}

Performance Considerations

Good: The 500ms TTL is appropriate for error states that need to be relatively fresh
Good: Caching non-error states reduces unnecessary workflow queries
⚠️ Concern: The two issues above mean some entries won't be cached, reducing cache effectiveness

Security Concerns

✅ No security issues identified. Error information is internal to the system.

Test Coverage

Missing: No tests found in the pegboard package

  • Consider adding tests for:
    • Caching behavior when errors are present
    • Caching behavior when errors are absent (None)
    • Behavior when workflows don't exist
    • Behavior when state parsing fails
    • Filtering logic that removes None errors from results

Minor Observations

Line 98: Early return when workflow_ids_to_fetch.is_empty() means runners without workflows won't be cached (see Issue #1 above)

Recommendations

  1. High Priority: Fix cache inconsistency for runners without workflows (Issue [SVC-2555] Set up issue templates #1)
  2. Medium Priority: Cache parse errors as None to avoid repeated failures (Issue [SVC-2479] Send cluster events to PostHog #2)
  3. Low Priority: Add unit/integration tests for this caching logic
  4. Consider: Add a comment explaining why RunnerPoolErrorCacheEntry uses Option<RunnerPoolError> while RunnerPoolErrorEntry doesn't

Verdict

The PR successfully achieves its goal of caching non-error states, which is a good improvement. However, there are two edge cases where entries won't be cached, which partially defeats the purpose of the change. These should be addressed to maximize cache effectiveness.

Recommendation: Request changes to handle missing workflows and parse errors in the cache.

@MasterPtato MasterPtato force-pushed the 01-14-fix_engine-runner_handle_shutdown_close_codes_correctly branch from 715b5e8 to d6cf484 Compare January 14, 2026 22:52
@MasterPtato MasterPtato force-pushed the 01-14-fix_pegboard_cache_none_runner_pool_errors_too branch 2 times, most recently from 7ae571e to 9d5adb3 Compare January 14, 2026 23:02
@MasterPtato MasterPtato force-pushed the 01-14-fix_engine-runner_handle_shutdown_close_codes_correctly branch 2 times, most recently from 13e2511 to ce0f27d Compare January 14, 2026 23:07
@MasterPtato MasterPtato force-pushed the 01-14-fix_pegboard_cache_none_runner_pool_errors_too branch from 9d5adb3 to 868b254 Compare January 14, 2026 23:07
@MasterPtato MasterPtato force-pushed the 01-14-fix_pegboard_cache_none_runner_pool_errors_too branch from 868b254 to efdfa84 Compare January 14, 2026 23:39
@MasterPtato MasterPtato force-pushed the 01-14-fix_engine-runner_handle_shutdown_close_codes_correctly branch from ce0f27d to 53bd400 Compare January 14, 2026 23:39
@graphite-app
Copy link
Contributor

graphite-app bot commented Jan 14, 2026

Merge activity

  • Jan 14, 11:40 PM UTC: MasterPtato added this pull request to the Graphite merge queue.
  • Jan 14, 11:41 PM UTC: CI is running for this pull request on a draft pull request (#3908) due to your merge queue CI optimization settings.
  • Jan 14, 11:42 PM UTC: Merged by the Graphite merge queue via draft PR: #3908.

@graphite-app graphite-app bot closed this Jan 14, 2026
@graphite-app graphite-app bot deleted the 01-14-fix_pegboard_cache_none_runner_pool_errors_too branch January 14, 2026 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants